generalized integrated gradient
Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG)
Kim, Yearim; Han, Sangyu; Han, Sangbum; Kwak, Nojun
In the field of eXplainable AI (XAI) in language models, the progression from local explanations of individual decisions to global explanations with high-level concepts has laid the groundwork for mechanistic interpretability, which aims to decode the exact operations a model performs. However, this paradigm has not been adequately explored in image models, where existing methods have primarily focused on class-specific interpretations. This paper introduces a novel approach to systematically trace the entire pathway from the input through all intermediate layers to the final output across the whole dataset. We use Pointwise Feature Vectors (PFVs) and Effective Receptive Fields (ERFs) to decompose model embeddings into interpretable Concept Vectors. We then calculate the relevance between concept vectors with our Generalized Integrated Gradients (GIG), enabling a comprehensive, dataset-wide analysis of model behavior.
In the field of XAI, efforts have historically progressed from local explanation to global explanation to mechanistic interpretability. While local explanation methods, including Selvaraju et al. (2016); Montavon et al. (2017); Sundararajan et al. (2017); Han et al. (2024), have focused on explaining specific decisions for individual instances, global explanation methods seek to uncover overall patterns and behaviors applicable across the entire dataset (Wu et al., 2022; Xuanyuan et al., 2023; Singh et al., 2024).
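Both papers collected here build on Integrated Gradients (Sundararajan et al., 2017). Standard IG attributes a scalar output F(x) to input feature i as (x_i - x'_i) times the average of dF/dx_i along the straight line from a baseline x' to x. The sketch below is plain, input-level IG on a toy differentiable function, not the concept-level GIG of Kim et al.; the function names, the tanh toy model, the zero baseline, and the step count are all illustrative assumptions.

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline, steps=200):
    """Midpoint Riemann approximation of standard Integrated Gradients.

    f_grad: callable returning the gradient of a scalar output w.r.t. the input.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1]
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([f_grad(baseline + a * (x - baseline)) for a in alphas])
    # Average gradient along the straight-line path, scaled by the input difference
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical toy differentiable "model": F(x) = tanh(w . x)
w = np.array([0.5, -1.0, 2.0])

def f(x):
    return np.tanh(w @ x)

def f_grad(x):
    return (1.0 - np.tanh(w @ x) ** 2) * w

x = np.array([1.0, 0.5, 0.25])
baseline = np.zeros(3)
attr = integrated_gradients(f_grad, x, baseline)
print("attributions:", attr)
print("sum of attributions:", attr.sum())
print("F(x) - F(baseline):", f(x) - f(baseline))
```

The last two printed values should agree closely: this is IG's completeness property, under which attributions sum to the change in output relative to the baseline. Increasing `steps` tightens the approximation.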
Introducing Generalized Integrated Gradients (GIG): A Practical Method for Explaining Diverse Ensemble Machine Learning Models - KDnuggets
Machine learning is proven to yield better underwriting results and mitigate bias in lending. But not all machine learning techniques, including the wide swath at work in unregulated uses, are built to be transparent. Many deployed algorithms generate results that are difficult to explain. Recently, researchers have proposed novel and powerful methods for explaining machine learning models, notably Shapley Additive Explanations (SHAP) and Integrated Gradients (IG). These methods provide mechanisms for assigning credit to the input variables a model uses to generate a score.
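SHAP estimates Shapley values; with only a handful of input variables they can be computed exactly by enumerating every coalition of features. This minimal sketch assumes that an "absent" feature is replaced by a fixed baseline value, which is only one of several conventions in use; the scoring function, its inputs, and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n):
    """Exact Shapley values of an n-player game by enumerating all coalitions.

    value: callable mapping a frozenset of player (feature) indices to a real number.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Classic Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of feature i when added to coalition S
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical score with a linear term and an interaction term
def score(z):
    return 2.0 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

def value(s):
    # Features in S take their actual value; the rest take the baseline (an assumption)
    z = [x[j] if j in s else baseline[j] for j in range(len(x))]
    return score(z)

phi = shapley_values(value, len(x))
print(phi, sum(phi), score(x) - score(baseline))
```

The linear term's credit goes entirely to feature 0, while the interaction term is split equally between features 1 and 2. Exact enumeration costs O(2^n) evaluations per feature, which is why practical SHAP tooling relies on approximations or model-specific algorithms.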
Generalized Integrated Gradients: A practical method for explaining diverse ensembles
Merrill, John; Ward, Geoff; Kamkar, Sean; Budzik, Jay; Merrill, Douglas
We introduce Generalized Integrated Gradients (GIG), a formal extension of the Integrated Gradients (IG) (Sundararajan et al., 2017) method for attributing credit to the input variables of a predictive model. GIG improves on IG by explaining a broader variety of functions that arise in practical applications of ML in domains such as financial services. GIG is constructed to overcome limitations of Shapley (1953) and Aumann-Shapley (1974), and it has desirable properties compared with other approaches. We prove that GIG is the only correct method, under a small set of reasonable axioms, for providing explanations for mixed-type models or games. We describe the implementation and present results of experiments on several datasets and systems of models.
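One way to see why plain IG struggles with the functions that arise in practice: a piecewise-constant component (for example, a decision-tree split inside an ensemble, an illustrative assumption here) has zero gradient almost everywhere, so a sampled path integral misses the jump and the attributions no longer sum to F(x) - F(x'). The sketch below demonstrates that failure on a hypothetical one-split model using finite-difference gradients. It illustrates the limitation only and is not an implementation of GIG, whose construction is not given in this abstract.

```python
import numpy as np

def step_model(x):
    # Hypothetical model: a single tree-style split on x[0] plus a smooth linear term in x[1]
    return float(x[0] > 0.5) + 0.3 * x[1]

def numerical_grad(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def integrated_gradients(f, x, baseline, steps=100):
    # Same midpoint-rule IG as before, but with numerical gradients
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([numerical_grad(f, baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 1.0])
baseline = np.zeros(2)
attr = integrated_gradients(step_model, x, baseline)
# Completeness gap: output change not accounted for by the attributions
gap = (step_model(x) - step_model(baseline)) - attr.sum()
print("attributions:", attr)
print("completeness gap:", gap)
```

The split feature receives essentially no credit even though it accounts for most of the output change, leaving a completeness gap of about 1.0. Handling such discontinuities in mixed-type models is the kind of case the abstract says GIG is designed for.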